Abstract —When executing whole-body motions, humans are
able to use a large variety of support poses which not only 
utilize the feet, but also hands, knees and elbows to enhance 
stability. While there are many works analyzing the transitions 
involved in walking, very few works analyze human motion 
where more complex supports occur. 

In this work, we analyze complex support pose transitions 
in human motion involving locomotion and manipulation tasks 
(loco-manipulation). We have applied a method for the detection
of human support contacts from motion capture data to a
large-scale dataset of loco-manipulation motions involving
multi-contact supports, providing a semantic representation of them.
Our results provide a statistical analysis of the support poses
used, their transitions and the time spent in each of them. In
addition, our data partially validates the taxonomy of
whole-body support poses presented in our previous work.

We believe that this work extends our understanding of 
human motion for humanoids, with a long-term objective 
of developing methods for autonomous multi-contact motion 
planning. 


I. Introduction 

While efficient solutions have been found for walking in 
different scenarios [1], [2], including rough terrain and going 
up/down stairs, humanoid robots are still not able to robustly 
use their arms to gain stability, robustness and safety while 
executing locomotion tasks. 

Robotics has approached this problem from a computational
point of view ([3], [4], [5], [6], [7]). However, due to
the complexity of the problem, these methods are still not
completely successful. In this work, we propose to take a step
back and analyze human motion in order to gain an understanding
of the decisions humans make when using multiple contacts.

Robotics in general, but particularly humanoid robotics, 
has always been inspired by biological human experience 
and the anatomy of the human body. However, human
motions involving support contacts have hardly been
studied [8], and even less is known about how healthy subjects
choose to make use of contacts with support surfaces. Works
like [9] show that, in a standing posture, reaching for a support
contact provides augmented sensory information, reducing
sway even if it is just through a “light touch”. This suggests that
the ability to reach for supports can be crucial to increase
robustness in tasks that require balance, like walking or
general locomotion, but also to increase maneuverability
in complex manipulation tasks. Nevertheless, to execute such
tasks in an autonomous way, we need to better understand 
Fig. 1. When performing locomotion (a), manipulation (b), balancing (c)
or kneeling (d) tasks, the human body can use a great variety of support 
poses to enhance its stability. Automatically detecting such support contacts 
allows for an automatic identification of the visited support poses and their 
transitions. 


the principles of whole-body coordination in humans, the 
variety of supporting whole-body postures available and how 
to transition between them. 

In this work, we analyze real human motion data captured
with a marker-based motion capture system and post-processed
using our unifying Master Motor Map (MMM)
framework [10], [11], to gain information about the poses
that are used while executing different locomotion and
manipulation tasks like those shown in Fig. 1. The analysis
presented allows us to quantify the amount of time spent in
each pose, classify transitions depending on their duration,
and build a graph of pose transitions that can shed light on
the difficult problem of finding motions that use multiple
contacts to balance.


This paper is organized as follows. Section II briefly
reviews related works. In Section III, we introduce our
methodology to detect support poses. In Section IV, we apply
our method to a large set of motions and analyze the resulting
data. Finally, Section V summarizes our contributions and
gives prospects of future work.

































II. Related Work 


There are very few works studying how we transition
between different support poses when performing movements.
Contributions analyzing human data focus on specific
postures for specific tasks, such as how to optimally hold on
to a handrail in a moving vehicle [12] or how to use hand support
to better resist perturbations [13]. However, other areas
of robotics like grasping have greatly benefited from the
study of human hand poses. Three decades ago, researchers started
analyzing human data to simplify the space of possible grasps
[14], [15]. Works like [16], [17] show that, although the hand
posture space is high-dimensional, the majority of useful
grasps can be described by a small number of discrete points
in this space. Grasp taxonomies have been very relevant
and successful [14], [18], [19], with a wide variety of
applications in grasp synthesis and autonomous grasp planning.
Only a few works have tried to extend these concepts
to whole-body motion [20]. In our recent work [21], we
proposed a taxonomy of whole-body poses that use the
environment to balance, based on a combinatorial approach of
all the possible contacts using humanoid limbs. The current
work provides a partial validation of the proposed taxonomy
using real human data.

From the robotics community, there has been significant
interest in improving balance control procedures beyond
double foot support [3], [22], with efficient path planners [23],
[6] that need to solve computationally costly constrained
optimization problems [5]. However, these solutions are
still not optimal, as planners are either very computationally
costly [6], [23], [7] or only locally optimal and not applicable
in an autonomous way [3], [4], [5]. Solutions dealing with
autonomous decision making [24], [25], [26] provide
interesting approaches but still do not scale to complex
scenes. Each of these layers of the problem is a demanding
problem on its own, and connecting all of them
remains one of the future challenges in robotics.

Our work relates to this literature in its long-term goal,
but approaches the problem from a different point
of view. Relying on human motion provides us with many
transition motions that can be transferred to robots [27],
[28], [29] and stored as, e.g., dynamic movement primitives
(DMPs) [30]. There have been many works on DMPs for
the whole body, showing that they can be adapted to different
situations and sequenced [31], [32], [33]. Other works
perform motion synthesis [34], [35], usually based on different
segmentation techniques. There has been extensive work on
these segmentation techniques for human motion [36], [37],
[38], [39], [40].

Existing works to detect support contacts use video data
[41], tracking algorithms to estimate ground reaction forces
[42], markers attached to the shoes to detect only floor
contacts [43], or minimal oriented bounding boxes to detect
links in contact without assuming environmental knowledge
[44].



Fig. 2. Setup used for motion capture. 


III. Detection of Whole-Body Poses and 
Segmentation 

A. Motion Acquisition 

We captured 121 human motions using an optical
marker-based Vicon MX motion capture system with 10 cameras.
A total of 56 passive reflective markers were attached to
the human subject at characteristic anatomical landmarks.
The subject was asked to perform different whole-body motions,
described in Table I. Fig. 2 illustrates the setup used for
motion capture. Details about the procedures used for motion
acquisition, e.g. the marker set, can be found in [11] and
online¹. After recording, the human motions were normalized
and post-processed as described in Section III-B.

In addition to the human motion, we also captured the
position and movement of objects and environmental elements.
To allow the reconstruction of object trajectories, at least
three additional markers were placed on each
object in a non-collinear arrangement. Using manually created
object models, object trajectories can then be estimated from
the marker trajectories, which allows the analysis of the interaction
between the human subject and these environmental entities
[40]. The KIT Whole-Body Human Motion Database [11]
contains a large set of motions captured with this approach, providing
raw motion capture data, corresponding time-synchronized
video recordings and processed motions.


B. Motion Processing 

The Master Motor Map (MMM) [45], [10] provides 
an open-source framework for capturing, representing and 
processing human motion. It includes a unifying reference 
model of the human body for the capturing and analysis 
of motion from different human subjects. The kinematic 
properties of this MMM reference model are based on 
existing biomechanical analysis by Winter [46] and allow 
the representation of whole-body motions using 104 degrees
of freedom (DoF): 6 for the root pose, 52 for the torso,
extremities and head, and 2 × 23 for the hands. For the analysis
in this work, we have excluded the hand joints.

To be able to extract semantic knowledge from the 
recorded motions, we first need to transfer these motions 
to the MMM reference model, i.e. reconstruct joint angle 
trajectories from the motion capture marker trajectories in 
Cartesian space. For this purpose, for every motion capture 


¹ https://motion-database.humanoids.kit.edu/marker_set/



















TABLE I

Evaluation of error of segmentation method

| Description | # Motions | Av. # Poses | Av. # Incorrect | Av. # Missed | Notes |
|---|---|---|---|---|---|
| Locomotion tasks | | | | | |
| downstairs w. handle | 10 | 12.3 | 0.1 | 2.5 | m: d.f.s.; i: lost hand support |
| upstairs w. handle | 19 | 17.05 | 0.26 | 0.16 | m: d.f.s.; i: lost hand supports |
| upstairs, turn and downstairs | 7 | 29.714 | 0.143 | 4 | m: d.f.s.; i: lost hand support |
| walks w. hand sup. to avoid obst. | 5 | 13.2 | 0.4 | 2.2 | m: d.f.s.; i: lost hand support |
| walk over beam w. handle | 5 | 19.4 | 0.2 | 0 | |
| Loco-Manipulation tasks | | | | | |
| kick box with foot w. hand sup. | 6 | 12.5 | 0.33 | 1.167 | m: d.f.s.; i: lost hand support |
| lean to place a cup on table | 6 | 15.33 | 0.17 | 0 | i: incorrect foot support |
| lean to pick a cup on table | 5 | 5 | 0 | 0 | |
| lean to pick a cup in air | 7 | 15 | 0.14 | 0 | i: lost hand support |
| lean to wipe | 6 | 12.5 | 0.5 | 0 | m: d.f.s. at start |
| bimanual pick and place | 6 | 13.833 | 0.833 | 0.667 | m: d.f.s.; i: lost foot support |
| pick up from floor w. hand sup. | 3 | 4.67 | 0.67 | 0 | i: extra hand support |
| Balancing tasks | | | | | |
| push rec. fr. behind push w. lean | 5 | 6.2 | 0 | 0 | |
| push rec. fr. left push w. lean | 9 | 9.3 | 0.11 | 0 | |
| inspect show sole w. sup. | 2 | 11 | 0 | 0 | |
| rec. fr. lost balance on 1 leg | 5 | 10.8 | 0 | 0.2 | |
| lean on table w. hands | 4 | 16.25 | 1.25 | 0 | i: lost hand support |
| Kneeling tasks | | | | | |
| kneel down | 4 | 8 | 0 | 0 | |
| kneel up | 7 | 7.857 | 0 | 0 | |
| Totals | 121 | 239.94 | 5.11 | 10.89 | |
| Percentages | | | 2.13% | 4.53% | |

Abbreviations: av. = average, w. = with, sup. = support, obst. = obstacle, fr. = from, rec. = recovery, i: incorrect, m: missed, d.f.s. = double foot support


marker on the human subject, we place one corresponding
virtual marker on the reference model.

Let $U = (\mathbf{u}_1, \ldots, \mathbf{u}_n)$ be an observation of the
3D positions of the $n$ captured markers and
$\mathbf{x} = (p_x, p_y, p_z, \alpha, \beta, \gamma, \theta_1, \ldots, \theta_m)$ the vector describing the pose
of the reference model, consisting of the root position and
rotation of the model and its $m$ joint angle values. Additionally,
let $V(\mathbf{x}) = (\mathbf{v}_1(\mathbf{x}), \ldots, \mathbf{v}_n(\mathbf{x}))$ be the positions of the
corresponding virtual markers as determined by the forward
kinematics of the model. The problem of determining the
pose of the MMM reference model for a given marker
observation $U$ is then solved by minimizing

$$f(\mathbf{x}) = \sum_{i=1}^{n} \lVert \mathbf{u}_i - \mathbf{v}_i(\mathbf{x}) \rVert^2$$

while maintaining the box constraints for $\theta_1, \ldots, \theta_m$ given
by the joint limits of the reference model. For every motion
frame, this optimization problem is solved by using the
reimplementation of the Subplex algorithm [47] provided
by the NLopt library [48] for nonlinear optimization. Poses
of objects involved in a motion are reconstructed from
object markers in a similar way by using a joint-less
six-dimensional pose vector.
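To make the fitting step concrete, the following is a minimal sketch under strong simplifying assumptions: a hypothetical planar three-link chain stands in for the 104-DoF MMM model, and SciPy's bounded L-BFGS-B optimizer is swapped in for the NLopt Subplex implementation used in this work. The names `virtual_markers` and `fit_frame` and the link lengths are illustrative only; what the sketch preserves is the structure of the problem, a sum of squared marker distances minimized under box constraints on the joint angles.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical planar chain standing in for the MMM reference model:
# a root (position + rotation) followed by three rigid links.
LINK_LENGTHS = np.array([0.40, 0.30, 0.20])  # metres, illustrative only
JOINT_LIMITS = (-2.5, 2.5)                    # radians, box constraint on each theta_j


def virtual_markers(x):
    """Forward kinematics V(x): a virtual marker at the root and at each link end.

    x = (p_x, p_y, alpha, theta_1, ..., theta_m) with m = len(LINK_LENGTHS) - 1.
    """
    root = np.asarray(x[:2], dtype=float)
    alpha, thetas = x[2], np.asarray(x[3:], dtype=float)
    # Absolute orientation of each link: root rotation plus accumulated joint angles.
    angles = alpha + np.concatenate(([0.0], np.cumsum(thetas)))
    steps = LINK_LENGTHS[:, None] * np.stack([np.cos(angles), np.sin(angles)], axis=1)
    return np.vstack([root, root + np.cumsum(steps, axis=0)])


def objective(x, observed):
    """f(x) = sum_i ||u_i - v_i(x)||^2 over all n markers."""
    return float(np.sum((observed - virtual_markers(x)) ** 2))


def fit_frame(observed, x0):
    """Fit one frame: root unbounded, joint angles kept within their limits."""
    n_joints = len(LINK_LENGTHS) - 1
    bounds = [(None, None)] * 3 + [JOINT_LIMITS] * n_joints
    result = minimize(objective, x0, args=(observed,), method="L-BFGS-B",
                      bounds=bounds, options={"ftol": 1e-12, "gtol": 1e-10})
    return result.x, result.fun


# Usage: synthesize markers from a known pose, then recover it from a nearby
# initial guess (in practice, e.g., the solution of the previous frame).
x_true = np.array([0.10, -0.20, 0.30, 0.50, -0.70])
markers = virtual_markers(x_true)
x_hat, cost = fit_frame(markers, x0=x_true + 0.1)
```

Warm-starting each frame from the previous solution is a natural choice here, since consecutive poses differ only slightly at 100 FPS and a local optimizer then stays in the correct basin.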

C. Extraction of Whole-Body Poses 

Support poses of the human subject are detected by 
analyzing the relation of the MMM reference model to 
the floor and environmental elements. For this purpose, we 
only consider objects which exhibit low movement during 
the recorded motion as suitable environmental elements to 
provide support. For every motion frame, we use the forward 


kinematics of the reference model to calculate the poses of 
the model segments that we consider for providing supports. 
These model segments represent the hands, feet, elbows and 
knees of the human body. 

A segment $s$ of the reference model is recognized as a
support if two criteria are fulfilled. First, the distance of $s$
to an environmental element must be lower than a threshold
$\delta_{dist}(s)$. Distances to environmental elements are computed
as the distances between pairs of closest points from the
respective models with triangle-level accuracy using Simox
[49]. Second, the speed of segment $s$, computed from
smoothed velocity vectors, has to stay below a threshold
$\delta_{vel}(s)$ for a certain number of frames, starting with the
frame where the support is first recognized. The thresholds
are chosen empirically: $\delta_{vel} = 200\,\mathrm{mm/s}$,
$\delta_{dist}(\text{Feet}) = \delta_{dist}(\text{Hands}) = 15\,\mathrm{mm}$, $\delta_{dist}(\text{Knees}) = 35\,\mathrm{mm}$ and
$\delta_{dist}(\text{Elbows}) = 30\,\mathrm{mm}$.
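The two support criteria can be sketched as follows for a single segment. This assumes that distances (mm) and smoothed speeds (mm/s) have already been computed per frame, e.g. with Simox. The required number of frames is not given in the text, so `min_frames` is a hypothetical parameter, and keeping a support active while the distance criterion holds is an interpretation of the rule rather than necessarily the exact implementation.

```python
import numpy as np

# Thresholds quoted in the text; the velocity threshold is assumed to be in mm/s.
DIST_THRESHOLD_MM = {"Feet": 15.0, "Hands": 15.0, "Knees": 35.0, "Elbows": 30.0}
VEL_THRESHOLD_MM_S = 200.0


def detect_support(dist_mm, speed_mm_s, segment, min_frames=10):
    """Per-frame support flags for one body segment (hypothetical helper).

    A support starts at frame t if the segment is closer than the distance
    threshold and its (already smoothed) speed stays below the velocity
    threshold for `min_frames` frames starting at t. It then lasts while the
    distance criterion keeps holding.
    """
    dist_mm = np.asarray(dist_mm, dtype=float)
    speed_mm_s = np.asarray(speed_mm_s, dtype=float)
    d_thr = DIST_THRESHOLD_MM[segment]
    flags = np.zeros(len(dist_mm), dtype=bool)
    active = False
    for t in range(len(dist_mm)):
        if dist_mm[t] >= d_thr:
            active = False  # too far from any environmental element
        elif not active:
            # Onset check: speed must stay low over the next min_frames frames.
            window = speed_mm_s[t:t + min_frames]
            active = len(window) == min_frames and bool(np.all(window < VEL_THRESHOLD_MM_S))
        flags[t] = active
    return flags


# Usage: a hand that is close to a table from the start, but only slows down
# at frame 5, and moves away at frame 15.
dist = np.r_[np.full(15, 5.0), np.full(5, 50.0)]
speed = np.r_[np.full(5, 500.0), np.full(15, 10.0)]
flags = detect_support(dist, speed, "Hands", min_frames=3)
```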

The support pose is defined by the set of contacts that
provide support to the subject. Parts of the motion where
the human body is not supported at all, e.g. the flight phases
during running, would correspond to an empty support pose
and are ignored. Also, some practical assumptions are used,
such as that a knee support also implies a foot support.
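Turning per-frame contacts into support poses and their durations can be illustrated with a small sketch. The contact names (`LFoot`, `RHand`, `LKnee`, and so on) and the label format for knees and elbows are assumptions; the foot and hand labels follow the pose names used in Table II, while the knee-implies-foot rule, the omission of unsupported phases, and the 100 FPS recording rate come from the text.

```python
from itertools import groupby

# (singular, plural) limb names; contact names are assumed to be e.g. "LFoot", "RHand".
LIMBS = (("Foot", "Feet"), ("Hand", "Hands"), ("Knee", "Knees"), ("Elbow", "Elbows"))


def pose_label(contacts):
    """Map a set of contacts to a pose label such as "1Foot-1Hand" (None if empty)."""
    contacts = set(contacts)
    for side in ("L", "R"):
        if side + "Knee" in contacts:  # a knee support also implies a foot support
            contacts.add(side + "Foot")
    parts = []
    for singular, plural in LIMBS:
        n = sum(1 for c in contacts if c.endswith(singular))
        if n:
            parts.append(f"{n}{singular if n == 1 else plural}")
    return "-".join(parts) if parts else None


def segment_poses(frame_contacts, fps=100):
    """Run-length encode per-frame poses into (pose, start_frame, n_frames, seconds)."""
    segments, start = [], 0
    for label, run in groupby(pose_label(c) for c in frame_contacts):
        n = len(list(run))
        if label is not None:  # unsupported (empty) phases are ignored
            segments.append((label, start, n, n / fps))
        start += n
    return segments


# Usage: double support, single support, a short flight phase, a hand
# support, and finally a knee on the ground.
frames = ([{"LFoot", "RFoot"}] * 30 + [{"LFoot"}] * 45 + [set()] * 5
          + [{"LFoot", "RHand"}] * 20 + [{"LKnee", "RFoot"}] * 10)
segments = segment_poses(frames)
```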


The video attachment shows some of the motions that were 
part of our evaluation along with detected support contacts 
and the resulting support poses. We manually validated
the segmentation by inspecting the detected support segments
frame by frame; the results are shown in Table I.


They show that about 4.5% of the poses are missed, but 
the missed poses are always double foot supports (with 


















































TABLE II

Percentages of appearances and time spent for each transition (%appearance, %time)

| From \ To | 1Foot | 1Foot-1Hand | 2Feet | 2Feet-1Hand | 2Feet-2Hands | 1Foot-2Hands | Totals per pose |
|---|---|---|---|---|---|---|---|
| 1Foot | 4.38%, 5.69% | 9.30%, 7.90% | 22.90%, 25.56% | 0.15%, 0.26% | - | 0.08%, 0.04% | 36.81%, 39.44% |
| 1Foot-1Hand | 9.15%, 13.64% | 1.81%, 2.26% | 0.08%, 0.03% | 12.24%, 16.59% | 0.08%, 0.02% | 0.15%, 0.02% | 23.51%, 32.57% |
| 2Feet | 16.02%, 10.05% | 0.15%, 0.04% | X | 3.48%, 2.23% | 0.08%, 0.06% | - | 19.73%, 12.38% |
| 2Feet-1Hand | 0.23%, 0.07% | 11.72%, 4.38% | 4.61%, 5.31% | X | 0.98%, 0.15% | - | 17.54%, 9.92% |
| 2Feet-2Hands | - | - | - | 0.83%, 1.22% | X | 0.68%, 0.75% | 1.51%, 1.97% |
| 1Foot-2Hands | - | 0.53%, 1.27% | - | - | 0.38%, 2.45% | X | 0.91%, 3.72% |


or without hand). Only 2.1% of the poses are incorrectly 
detected. 
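As a quick check, these rates follow from the Totals row of Table I, which sums the per-task averages (so each task type contributes equally, regardless of how many motions it contains):

$$
\frac{\sum \text{Av. \# Missed}}{\sum \text{Av. \# Poses}} = \frac{10.89}{239.94} \approx 0.045,
\qquad
\frac{\sum \text{Av. \# Incorrect}}{\sum \text{Av. \# Poses}} = \frac{5.11}{239.94} \approx 0.021
$$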


IV. Results 

A. Statistical Analysis of the Detected Poses and Their 
Transitions 

Without taking kneeling motions into account, we have
recorded and analyzed 110 motions including locomotion,
loco-manipulation and balancing tasks listed in Table I. In
this section, we present an analysis of the most common
pose transitions and the time spent on them. We ignore
kneeling motions because we do not yet have enough data
to obtain significant results. In every motion, both the
initial and the final pose are double foot supports and the
time spent in these poses is arbitrary. Therefore, they have
been ignored for the statistical analysis. Without counting
them, we have automatically identified a total of 1323 pose
transitions lasting a total time of 541.48 seconds (9.02 min).
In Table II, each cell represents the transition going from the
pose indicated by the row name to the pose indicated by the
column name. In each cell, we show first the percentage of
occurrence of the transition with respect to the total number
of transitions detected, and second the percentage of time
spent in the origin pose before reaching the destination pose,
with respect to the total time of all motions. The last column
accumulates the percentages per pose, and the
rows are sorted from the most to the least common pose.
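The two percentages reported in each cell can be computed from the segmented motions roughly as follows. The input format (a list of `(pose, n_frames)` pairs per motion) is hypothetical, and skipping only the transitions that leave the initial pose is an interpretation of how the arbitrary-length initial and final poses are excluded (the final pose is never an origin, so its duration never enters the time share).

```python
from collections import Counter


def transition_table(motions):
    """Return {(origin, destination): (%appearance, %time)} over all motions.

    motions: list of [(pose, n_frames), ...] per motion (hypothetical format).
    The time of a transition is the time spent in its origin pose; the
    transition leaving the arbitrary-length initial pose is skipped.
    """
    counts, frames = Counter(), Counter()
    for segments in motions:
        for (origin, n_origin), (dest, _) in zip(segments[1:-1], segments[2:]):
            counts[(origin, dest)] += 1
            frames[(origin, dest)] += n_origin
    total_count, total_frames = sum(counts.values()), sum(frames.values())
    return {k: (100 * counts[k] / total_count, 100 * frames[k] / total_frames)
            for k in counts}


# Usage with two short synthetic motions (durations in frames at 100 FPS).
motions = [
    [("2Feet", 80), ("1Foot", 50), ("2Feet", 12), ("1Foot", 48), ("2Feet", 90)],
    [("2Feet", 60), ("1Foot-1Hand", 70), ("2Feet-1Hand", 20), ("2Feet", 100)],
]
table = transition_table(motions)
```

Summing the entries of one row of the result over all destinations gives the "Totals per pose" column.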

Note that the loop transitions 1Foot→1Foot
and 1Foot-1Hand→1Foot-1Hand mostly correspond to missed double
foot supports, and we do not include them in the analysis.


According to Table II, the most common transitions are
1Foot→2Feet (22.90% of appearance) and 2Feet→1Foot
(16.02% of appearance). These are the transitions of
walking, which have been widely studied. Winter reported in
[50] that, depending on whether walking is slow or fast, the
time spent in 2Feet→1Foot (double foot support) is
11-19 frames², while for 1Foot→2Feet (single foot support)
it is 38-52 frames. Although all our motions contain some
steps of normal walking, they also involve hand supports, and
therefore these transitions may show different time behaviors
when they are part of a more complex set of transitions. We are
interested in observing similar long and short locomotion
transitions involving other poses. In addition, we find a
third type of transition that usually lasts longer because it
supports a manipulation task. The transitions in Table II
whose share of time is proportionally larger than their
frequency suggest that they may be either long
locomotion transitions or supports for manipulation tasks.

² All times are measured in frames, with motions recorded at 100 FPS.


B. Analysis of the Time Spent per Transition 

To study the time spent in each transition in more detail. 
Fig. 3| shows the histograms of time spent in the most 
common pose transitions. In yellow, we show all transitions 
involved in locomotion tasks, and we can observe that the 
histograms in (a), (b) and (f) show bimodal distributions. 
For the first two cases, we could fit a mixture distribution 
of 2 normals, with parameters N(p = 11.76, a = 8.60) 
and N(p = 53.89, a = 11.35) for the plot (a) and N(p = 
15.9905, cr = 9.6776) and N(p = 55.8507, a = 8.7216) for 
the plot (b), with a confidence probability of 0.969 and 0.944 
respectively. This indicates that the lFoot—>2Feet transition 
can play the role of a long locomotion transition with mean 
53 frames, but can also be a short transition with times 
around 11 frames, and similarly for (b). For the plot (f), 
we could fit the two normals N(p = 36.47, a = 26.59) and 
N(p = 73.11, a = 8.319), but with only 0.80 of confidence. 
We need more data to verify these mean values. Still, 
inspecting the histogram it is clear that the transition lFoot- 
lHand—>2Feet-lHand can act as a long transition with mean 
times of around 70 frames Other transitions like (c), (d), (g) 
and (h) are clearly short transitions. Plot (c) corresponds to 
2Feet—HFoot (the usual walking double foot support) that 
for locomotion tasks is clearly on the short duration, with 
76.8% of the cases below 20 frames (91% below 30). 
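The bimodal fits can be reproduced in spirit with a two-component Gaussian mixture estimated by expectation-maximization (EM). The text does not state which fitting procedure or confidence measure was used, so the EM sketch below is an assumed substitute and does not compute the reported confidence values; the synthetic durations merely reuse the fitted parameters of plot (a) for illustration.

```python
import numpy as np


def fit_two_gaussians(durations, n_iter=200, tol=1e-8):
    """Fit a 1-D mixture of two normals to transition durations with EM (sketch).

    Returns (weights, means, standard deviations), one entry per component.
    """
    x = np.asarray(durations, dtype=float)
    # Initialise the two components at the lower and upper quartiles.
    mu = np.percentile(x, [25, 75])
    sigma = np.full(2, x.std())
    w = np.array([0.5, 0.5])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: weighted normal densities and responsibilities per sample.
        dens = w * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        total = np.maximum(dens.sum(axis=1, keepdims=True), 1e-300)
        resp = dens / total
        # M-step: re-estimate weights, means and standard deviations.
        nk = resp.sum(axis=0)
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        # Stop once the log-likelihood no longer improves noticeably.
        ll = np.log(total).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return w, mu, sigma


# Usage on synthetic durations (in frames) drawn from a short and a long mode.
rng = np.random.default_rng(0)
durations = np.concatenate([rng.normal(11.76, 8.60, 400), rng.normal(53.89, 11.35, 600)])
w, mu, sigma = fit_two_gaussians(durations)
```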

In blue, we show the loco-manipulation tasks. These 
tasks include walking, but also transitions for supporting 
the manipulation task. Note that transitions to support the 
manipulation are not very frequent because there is only one 
per motion, while transitions for walking are the majority, 
shown in plots (a) and (c). As expected, (a) shows long 
locomotion transition types, while (c) shows short ones. 
However, in (c) we see some long-lasting poses. Inspecting 
the data task by task, we see that these happen in the 
bimanual pick and place of the big box, because the double 
foot pose supports the action of crouching to pick up the 
box. In the remaining plots, we can see other transitions 
supporting manipulation. For instance, the ones in plots (e) 
and (f) correspond to some of the motions of leaning to 
Fig. 3. Histograms showing the occurrences of frames spent in each transition,
to clarify the different x axis scales. 
Fig. 4. Transition graph of whole-body pose transitions automatically generated from the analyzed motions. Labels on edges indicate the number of
transitions found of each type. 


wipe, reach or place, where the subject uses a 1Foot-1Hand pose
to perform the task (as in Fig. 1(b)), while in other leaning
actions, the subject uses the 2Feet-1Hand pose, shown in
plot (g).

Finally, balancing tasks are plotted in red. They consist
mostly of very short transitions because the motions are very
fast, especially after pushes. As before, plots (a) and (c)
accumulate the poses of the walking parts of the motions.
In (b), the transitions lasting around 200 frames correspond to
balancing on one foot, which lasts until the subject loses balance
and needs to lean with the hand. After that, the 1Foot-1Hand
pose is used until balance is recovered, appearing
as long transitions in plot (e). Other transitions supporting
a task in plot (e) correspond to inspecting the shoe sole,
which is supported by a 1Foot-1Hand pose. We do not show
the histograms of poses with four supports because we do not
have enough instances of them, but they all occur during
balancing tasks.

The motions recorded for this work can be found in the
KIT Whole-Body Human Motion Database [11]³.


³ See https://motion-database.humanoids.kit.edu/details/motions/<ID>/ with ID ∈ {383, 385, 410, 412, 415, 456,
460, 463, 515, 516, 517, 520, 521, 523, 527, 529, 530, 531, 597, 598, 599,
600, 601, 604, 606, 607}.


C. Data-Driven Generation of a Transition Graph of Whole-Body Poses


In [21], we proposed a full taxonomy of whole-body 
support poses that was based on a combinatorial approach 
considering all the relevant contacts with the body. The 
current work has been inspired by our taxonomy, and one of 
its objectives is to validate the transitions that we proposed. 
However, we can only provide a partial validation, because
our theoretical taxonomy contained poses with holds (like
when hands grasp a handle) and also arm and forearm
supports, which do not appear in any of the motions analyzed
here. Also, we need more data on kneeling poses to observe
more of the four-support poses.


Fig. 4 shows the automatically generated transition graph,
also considering the start and end poses of each motion.
Each edge corresponds to a transition, and their labels to 
the number of times we have found it. Edges plotted in red 
correspond to transitions where two simultaneous changes 
of contacts occur. In our theoretical taxonomy [21], we 
assumed that only one change of support should be allowed 
per transition. While this is still desirable for robotics, it is 
also obvious that some human transitions involve two contact 
changes. For instance, in push recovery motions, humans 






















































Fig. 5. Output of the segmentation for one of the motions going upstairs with a handle. The segment shown in red represents the initial pose transition, which
has an arbitrary length. Blue segments represent transitions where the foot swings. Blue labels indicate transition durations. We can see that the human
alternates between single foot support swing and 1Foot-1Hand support swing using the handle.


usually lean on the wall using both arms at the same time
to increase safety and robustness. Many of the red edge
transitions in Fig. 4 occur in balancing tasks.

In the transition graph shown in Fig. 4, we can quickly see
that red edges have significantly lower frequencies than the
black ones, except for the loop edges in the 1Foot and
1Foot-1Hand poses, which are caused by either jumps or missed
double foot supports. They correspond to the roughly 4.5% of poses
missed by our segmentation method, as reported in Table I.
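The graph construction and the red-edge rule can be sketched as follows, assuming each motion is available as a sequence of contact sets (one per detected support pose, start and end poses included). Marking a transition as red when the symmetric difference of its origin and destination contact sets has two or more elements is an interpretation of "two simultaneous changes of contacts"; the helper names and contact names are hypothetical, and only feet and hands are handled here.

```python
from collections import Counter


def pose_class(contacts):
    """Pose class label from a contact set, e.g. {"LFoot", "RHand"} -> "1Foot-1Hand"."""
    feet = sum(c.endswith("Foot") for c in contacts)
    hands = sum(c.endswith("Hand") for c in contacts)
    parts = [f"{feet}{'Foot' if feet == 1 else 'Feet'}" if feet else "",
             f"{hands}{'Hand' if hands == 1 else 'Hands'}" if hands else ""]
    return "-".join(p for p in parts if p)


def transition_graph(motions):
    """Edge counts between pose classes, each flagged red if any of its
    transitions changes two or more contacts at once."""
    counts, red = Counter(), {}
    for poses in motions:
        for origin, dest in zip(poses[:-1], poses[1:]):
            edge = (pose_class(origin), pose_class(dest))
            counts[edge] += 1
            # Symmetric difference = contacts that were added or removed.
            red[edge] = red.get(edge, False) or len(origin ^ dest) >= 2
    return {edge: (n, red[edge]) for edge, n in counts.items()}


# Usage with two synthetic motions.
L, R, RH = "LFoot", "RFoot", "RHand"
walk = [frozenset(p) for p in ([L, R], [L], [L, R])]
lean = [frozenset(p) for p in ([L, R], [L], [L, R], [R, RH], [L, RH], [L, R])]
graph = transition_graph([walk, lean])
```

With contact sets, switching the supporting foot (e.g. {RFoot, RHand} to {LFoot, RHand}) counts as two changes, which is consistent with the loop edges in the 1Foot and 1Foot-1Hand poses appearing as red edges.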


This data-driven transition graph is influenced by the type
of motions we have analyzed, which use only one handle or one
hand support. Only balancing motions reach the four-support
poses. In future work, we will analyze walking motions with
handles on both sides.


Fig. 5 shows the timeline of a motion where the subject
goes upstairs using a handle on his right side. In blue, we
show the long locomotion transitions. The support pose
for these transitions alternates between 1Foot-1Hand, used
to swing the free foot forward, and 1Foot, used
to swing forward both the hand holding the handle and the free
foot. This is because only one handle is provided. Another
interesting observation is that the short locomotion
transitions appear in clusters, each composed of a sequence of two
transitions. We have observed this in many of the motions,
and we have found that the order of the transitions
inside these clusters does not matter, only the start and end
poses. We believe that each cluster could be considered as
a composite transition where several contact changes occur.
As future work, we want to detect and model these clusters
to identify rules that allow us to automatically generate
sequences of feasible transitions according to the extremities
available for contacts.
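One possible way to group such clusters into composite transitions is sketched below. The 30-frame threshold separating short from long poses is a hypothetical choice, motivated by the observation that 91% of the locomotion double foot supports last less than 30 frames; the function name and the `(pose, n_frames)` input format are illustrative.

```python
def composite_transitions(segments, short_frames=30):
    """Group runs of short poses between long poses into composite transitions.

    segments: list of (pose, n_frames). Poses lasting at least `short_frames`
    frames are treated as long; every pair of consecutive long poses yields
    one composite transition (origin, destination, intermediate short poses).
    """
    long_idx = [i for i, (_, n) in enumerate(segments) if n >= short_frames]
    result = []
    for i, j in zip(long_idx[:-1], long_idx[1:]):
        via = [pose for pose, _ in segments[i + 1:j]]
        result.append((segments[i][0], segments[j][0], via))
    return result


# Usage with a synthetic timeline resembling a stair ascent with one handle.
timeline = [("1Foot-1Hand", 60), ("2Feet-1Hand", 8), ("2Feet", 6), ("1Foot", 55),
            ("2Feet", 10), ("2Feet-1Hand", 7), ("1Foot-1Hand", 58)]
composites = composite_transitions(timeline)
```

Keeping the intermediate poses in the output, instead of discarding them, leaves room to check whether their order really does not matter across many clusters.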


V. Conclusions and Future Work 

We have presented an analysis of the support poses of more
than 100 motion recordings showing different locomotion
and manipulation tasks. Our method allowed us to retrieve
the sequence of support poses used and the time spent in
each of them, providing segmented representations of
multi-contact motions.

Although the most common pose transitions are the ones
involved in walking, we have shown that the 1Foot-1Hand
and the 2Feet-1Hand poses also play a crucial role in
multi-contact motions. We have classified our data into short and
long locomotion transitions and transitions for supporting a
task, depending on the time spent on them. We have observed
that very short locomotion transitions are found in clusters
that can be grouped as complex transitions with more than
one contact change. The data-driven transition graph partially
validates the transitions proposed in our previous work. We
believe that our motions, segmented by support poses and
time spent per transition, provide a meaningful semantic
representation of a motion.

This work opens the door to many exciting future
directions. First, we are interested in analyzing our motion
representations to find semantic rules that can help define
new motions for different situations, with the objective of
building a grammar of motion poses. By storing each transition
as a motion primitive, we are also interested in performing
path planning at a semantic level based on support poses.

Finally, we are still assuming very simplified poses that 
do not consider directions of support, represented by simple 
sketch figures. However, for each class of poses there is an 
infinite number of possible body configurations depending on 
location and orientation of contacts. Future work directions 
include finding the most relevant whole-body eigen-grasps, 
that is, we will perform principal component analysis to 
reduce the dimensionality of the space that can realize each 
pose. 

In conclusion, this work presents a step further in the 
comprehension of how humans can utilize their bodies to 
enhance stability for locomotion and manipulation tasks. 